CGN to Grail: Extracting a Type-logical Lexicon From the CGN Annotation

Authors

  • Michael Moortgat
  • Richard Moot
Abstract

The tag set for the CGN syntactic annotation is designed to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these frameworks. In this paper we discuss some preliminary work on the mapping between the CGN annotation graphs and the proof net format of the Grail parser/theorem prover (Moot 2001, Moot 1999). Grail is a general grammar development environment for type-logical categorial grammars (TLG; Moortgat 1997, Morrill 1994, Carpenter 1998). To a large extent, there is a straightforward transfer between the type-logical format and the analyses provided by other lexicalized grammar formalisms such as LTAG (lexicalized Tree Adjoining Grammars; Sarkar 2001) and MG (computational versions of Minimalist Grammars; Stabler 1997). An attractive feature of TLG, which is not shared by these other frameworks, is its full support for hypothetical reasoning. In this paper, we exploit the hypothetical reasoning facilities to extract a type-logical grammar from the CGN annotation graphs. This task divides naturally into two subtasks. The first consists in solving type equations: in the TLG setting this means breaking up the CGN annotation graph into the subgraphs that correspond to lexical type assignments. In the presence of discontinuous dependencies, the lexical type assignments will not always be compatible with surface word order. The second subtask then consists in calibrating the lexicon so that it has controlled access to the structural reasoning component of the grammar.
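To make the first subtask more concrete, the following Python sketch shows one simple way of reading lexical types off a head-annotated dependency tree. It is an illustration only and not the procedure of the paper: the `Tree` class, the helper names, the choice of `mod` as the only modifier label, and the example sentence are all assumptions made here. It also assumes a continuous tree whose daughters are listed in surface order, so the discontinuous dependencies that the abstract handles through hypothetical reasoning and structural rules are out of scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: only "mod" marks a modifier; every other non-head label
# (su, obj1, det, ...) is treated as an argument of the head.
MODIFIER_LABELS = {"mod"}


@dataclass
class Tree:
    """Hypothetical stand-in for an annotation node: a category label,
    a word (leaves only), and (dependency label, daughter) pairs."""
    cat: str
    word: Optional[str] = None
    children: List[Tuple[str, "Tree"]] = field(default_factory=list)


def paren(t: str) -> str:
    """Bracket a type unless it is atomic."""
    return t if "/" not in t and "\\" not in t else f"({t})"


def over(result: str, arg: str) -> str:
    """result/arg: looks for arg to its right."""
    return f"{paren(result)}/{paren(arg)}"


def under(arg: str, result: str) -> str:
    """arg\\result: looks for arg to its left."""
    return f"{paren(arg)}\\{paren(result)}"


def extract(tree: Tree, goal: str, lexicon: List[Tuple[str, str]]) -> None:
    """Assign `goal` to `tree` and push type assignments down to the words."""
    if tree.word is not None:
        lexicon.append((tree.word, goal))
        return
    h = [label for label, _ in tree.children].index("hd")
    current = goal
    # Left daughters, outermost first: each argument adds a backslash to the
    # type still owed to the head; a modifier maps that type to itself.
    for label, child in tree.children[:h]:
        if label in MODIFIER_LABELS:
            extract(child, over(current, current), lexicon)
        else:
            extract(child, child.cat, lexicon)
            current = under(child.cat, current)
    # Right daughters, outermost first; their own extraction is postponed
    # so that the lexicon comes out in surface order.
    pending = []
    for label, child in reversed(tree.children[h + 1:]):
        if label in MODIFIER_LABELS:
            pending.append((child, under(current, current)))
        else:
            pending.append((child, child.cat))
            current = over(current, child.cat)
    extract(tree.children[h][1], current, lexicon)
    for child, child_goal in reversed(pending):
        extract(child, child_goal, lexicon)


def extract_lexicon(tree: Tree) -> List[Tuple[str, str]]:
    lexicon: List[Tuple[str, str]] = []
    extract(tree, tree.cat, lexicon)
    return lexicon


# Invented example: "Jan leest een mooi boek" (Jan reads a nice book).
example = Tree("smain", children=[
    ("su", Tree("np", "Jan")),
    ("hd", Tree("v", "leest")),
    ("obj1", Tree("np", children=[
        ("det", Tree("det", "een")),
        ("mod", Tree("adj", "mooi")),
        ("hd", Tree("n", "boek")),
    ])),
])
lexicon = extract_lexicon(example)
```

In this sketch the head word of each phrase receives a functor type that consumes its arguments, with the slash direction fixed by their position relative to the head, while a modifier receives a type that maps the partial result it attaches to onto itself. For the example, `leest` comes out as `(np\smain)/np` and `mooi` as `(det\np)/(det\np)`. Because the direction is read off surface order, the approach has no answer for a dependent that is not adjacent to its head's projection, which is exactly the situation the second subtask addresses.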


Related resources

Using the Spoken Dutch Corpus for type-logical grammar induction

Abstract The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these framewo...


Carrageenan induces cell cycle arrest in human intestinal epithelial cells in vitro.

Multiple studies in animal models have shown that the commonly used food additive carrageenan (CGN) induces inflammation and intestinal neoplasia. We performed the first studies to determine the effects of CGN exposure on human intestinal epithelial cells (IEC) in tissue culture and tested the effect of very low concentrations (1-10 mg/L) of undegraded, high-molecular weight CGN. These concentr...


Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)

Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constit...


Draft October 2003 3 From recent overviews of annotated

This chapter describes the broad phonemic transcription in the CGN. First a broad overview of phonetic annotations in Dutch corpora is provided and a number of crucial dimensions are discussed: the source of annotation (human or automatic), the type of material involved, the level of transcription and the symbol set and transcription conventions. These dimensions serve as a guide through a numb...


Syntactic Analysis in the Spoken Dutch Corpus (CGN)

The paper describes the syntactic annotation of the Spoken Dutch Corpus (“Corpus Gesproken Nederlands” or CGN), the Dutch-Flemish project (1998-2003) aiming at the collection, description and annotation of ten million words of spoken Dutch. In the first part, the background of the parsing strategy is discussed, as well as some details concerning the actual implementation of the parsing process....



Publication date: 2000